Topic Identification in Dynamical Text by Extracting Minimum Complexity Time Components

Author

  • Ella Bingham
Abstract

The problem of analysing dynamically evolving textual data has recently arisen. An example of such data is the discussion appearing in Internet chat lines. In this paper, a recently introduced source separation method, termed complexity pursuit, is applied to the problem. The method generalises projection pursuit to time series and can use both spatial and temporal dependency information to separate the topics of the discussion. Experimental results on chat line and newsgroup data demonstrate that the minimum complexity time series do indeed correspond to meaningful topics inherent in the dynamical text data, and also suggest that the method is applicable to query-based retrieval from a temporally changing text stream. The complexity pursuit method is compared to several ICA-type algorithms for time series.
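
As a rough illustration of how temporal dependency can help separate mixed time series, the sketch below applies a generic second-order method (whitening followed by diagonalising one time-lagged covariance, in the spirit of AMUSE) to synthetic data. It is not complexity pursuit and is not claimed to be one of the algorithms compared in the paper. The function names `ar1` and `lagged_decorrelation`, the AR(1) "topic activation" signals, and the mixing matrix are all assumptions made for this example.

```python
import numpy as np


def ar1(coef, n, rng):
    """Generate a zero-mean AR(1) series s[t] = coef * s[t-1] + noise[t]."""
    noise = rng.standard_normal(n)
    s = np.zeros(n)
    for t in range(1, n):
        s[t] = coef * s[t - 1] + noise[t]
    return s


def lagged_decorrelation(X, lag=1):
    """Separate the rows of X (signals x time) using one time-lagged covariance."""
    X = X - X.mean(axis=1, keepdims=True)
    # Spatial step: whiten so the zero-lag covariance becomes the identity.
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    V = E @ np.diag(d ** -0.5) @ E.T
    Z = V @ X
    # Temporal step: diagonalise the symmetrised lagged covariance.
    C = Z[:, lag:] @ Z[:, :-lag].T / (Z.shape[1] - lag)
    _, U = np.linalg.eigh((C + C.T) / 2)
    W = U.T @ V  # estimated unmixing matrix
    return W @ X, W


rng = np.random.default_rng(0)
# Hypothetical "topic activations": one slowly varying, one nearly white.
S = np.vstack([ar1(0.9, 5000, rng), ar1(0.1, 5000, rng)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # hypothetical mixing matrix
X = A @ S
S_hat, W = lagged_decorrelation(X)

# Absolute correlation between each true source (rows) and each estimate (columns).
match = np.abs(np.corrcoef(np.vstack([S, S_hat]))[:2, 2:])
print(np.round(match, 2))
```

The sources are recoverable here only because their lag-1 autocorrelations differ; recovery is up to sign and ordering, which is why the comparison uses absolute correlations.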

Similar resources

A review of text mining approaches and their function in discovering and extracting a topic

Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...

The Exact Solution of Min-Time Optimal Control Problem in Constrained LTI Systems: A State Transition Matrix Approach

In this paper, the min-time optimal control problem is mainly investigated in the linear time invariant (LTI) continuous-time control system with a constrained input. A high order dynamical LTI system is firstly considered for this purpose. Then the Pontryagin principle and some necessary optimality conditions have been simultaneously used to solve the optimal control problem. These optimality ...

A Novel Method for Detection of Epilepsy in Short and Noisy EEG Signals Using Ordinal Pattern Analysis

Introduction: In this paper, a novel complexity measure is proposed to detect dynamical changes in nonlinear systems using ordinal pattern analysis of time series data taken from the system. Epilepsy is considered as a dynamical change in nonlinear and complex brain system. The ability of the proposed measure for characterizing the normal and epileptic EEG signals when the signal is short or is...

An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification

Due to the exponential growth of electronic texts, their organization and management requires a tool to provide information and data in search of users in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...

Feature Extraction and Selection for Handwriting Identification: A review

Handwriting is a skill that is personal to individual [28]. The relation of character, shape and the style of writing are visually different from one to another. Handwriting identification is a process to identify or verify the authorship of a handwriting document. Asserting authorship identity based on handwritten text requires three steps: Data acquisition and preprocessing, Feature extractio...

Publication date: 2001